The Role of Models in Predictive Validation
Author
Abstract
Model choice and validation play a central role in data analysis, including predictive modeling. While standard diagnostics can help identify model inadequacies, it is natural to use predictive accuracy as the decisive criterion in the final choice of a predictive model. A key point is that any assessment of predictive accuracy, theoretical or empirical, inevitably assumes a “data mechanism”, i.e., a sampling or other stochastic model that relates model predictions to the population that is the target for predictions.

In a controversial paper, Breiman [3] presents predictive accuracy as an obvious and natural criterion for model assessment. He criticises a statistical culture that has relied almost exclusively on models that assume a stochastic “data mechanism” thought to describe the underlying scientific processes. Breiman argues for wider use of algorithmic models, such as tree-based regression and neural nets, that “treat the data mechanism as unknown”.

Disregard of data mechanisms has its limits. Simple approaches to predictive validation assume, in effect, that the data are a random sample from the population to which predictions will be applied. This assumption is often inappropriate. Cox [3, following comment] notes that predictions are often applied “under quite different conditions from the data”. Indeed, the conditions may be so different that a realistic assessment of predictive accuracy is impossible.

In what follows I will (1) comment briefly on algorithmic models; (2) note that the interest is, in some contexts, in model parameters; (3) comment in more detail on predictive accuracy; (4) discuss implications for data mining and for the use of databases.
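The point about random sampling can be illustrated with a small simulation. The Python sketch below is not taken from the paper; the quadratic data mechanism, the noise level, the sampling ranges, and all function names are assumptions chosen purely for illustration. It fits a straight line to data drawn from one range of x, estimates predictive error by cross-validation within that sample, and then compares the estimate with the error observed when predictions are applied to data drawn from a different range.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mse(model, xs, ys):
    """Mean squared prediction error of a fitted line on (xs, ys)."""
    a, b = model
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def simulate(rng, n, low, high):
    """Assumed data mechanism: y = x^2 + Gaussian noise, x uniform on [low, high]."""
    xs = [rng.uniform(low, high) for _ in range(n)]
    ys = [x * x + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys


def cv_mse(xs, ys, k=5):
    """k-fold cross-validation estimate of mean squared prediction error."""
    n = len(xs)
    total = 0.0
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        a, b = fit_line(train_x, train_y)
        total += sum((ys[i] - (a + b * xs[i])) ** 2 for i in test_idx)
    return total / n


rng = random.Random(0)

# Training data: x sampled from [0, 1].
xs, ys = simulate(rng, 200, 0.0, 1.0)
cv_estimate = cv_mse(xs, ys)

# Target population under different conditions: x sampled from [1, 2].
model = fit_line(xs, ys)
new_xs, new_ys = simulate(rng, 200, 1.0, 2.0)
shifted_error = mse(model, new_xs, new_ys)

print(f"cross-validated MSE on the sampled range: {cv_estimate:.4f}")
print(f"MSE on the shifted target population:     {shifted_error:.4f}")
```

In this setup the cross-validated estimate is a fair guide only for new data drawn from the same range as the training sample; it says nothing reliable about the shifted population, where the straight-line approximation breaks down.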
Similar resources
Designing an artificial neural network for joint prediction of metabolic syndrome and the insulin resistance index (HOMA-IR): Tehran Lipid and Glucose Study
Background & Objective: Mixed outcomes arise when, in a multivariate model, response variables are measured on different scales, such as binary and continuous. In bivariate modeling with mixed response variables, the common methods of classical statistics have shortcomings. This study aimed at designing an appropriate ANN model for modeling and predicting the bivariate mixed responses i...
Validation and application of empirical shear wave velocity models based on standard penetration test
Shear wave velocity is a basic engineering tool required to define dynamic properties of soils. In many instances it may be preferable to determine Vs indirectly by common in-situ tests, such as the Standard Penetration Test. Many empirical correlations based on the Standard Penetration Test are broadly classified as regression techniques. However, no rigorous procedure has been published for c...
Logic regression and its application in predicting diseases
Regression is one of the most important statistical tools in data analysis and in the study of the relationship between predictor variables and the response variable. In most problems, regression models and decision trees can show only the main effects of predictor variables on the response, and the interactions considered do not go beyond two-way and ultimately three-way, due to co...
Quantitative Structure Activity Relationship Analysis of Coumarins as Free Radical Scavengers by Genetic Function Algorithm
The antioxidant properties of coumarin derivatives using the 2,2′-diphenyl-1-picrylhydrazyl (DPPH) radical scavenging assay were investigated by the application of Quantitative Structure Activity Relationship (QSAR) studies. The molecular structures were optimized and submitted for the generation of quantum chemical and molecular descriptors. Genetic Function Algorithm (GFA) was employed in m...
QSAR models to predict physico-chemical properties of some barbiturate derivatives using molecular descriptors and genetic algorithm-multiple linear regressions
This study examines the relationship between descriptors selected by a genetic algorithm and the polarizability (POL), molar refractivity (MR), and octanol/water partition coefficient (LogP) of barbiturates. The chemical structures of the molecules were optimized using the ab initio method with the 6-31G basis set and the Polak-Ribiere conjugate gradient algorithm within the HyperChem 8.0 environ...
A comparative QSAR study of aryl-substituted isobenzofuran-1(3H)-ones inhibitors
A comparative workflow, including linear and non-linear QSAR models, was carried out to evaluate the predictive accuracy of models and predict the inhibition activity of a series of aryl-substituted isobenzofuran-1(3H)-ones. The data set, consisting of 34 compounds, was randomly divided into training and test sets. Molecular descriptors were selected using the genetic algorithm (GA) as a f...